Temporal Difference Learning for Nondeterministic Board Games
Authors
Abstract
We use temporal difference (TD) learning to train neural networks for four nondeterministic board games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two variables on the development of these networks: first, the source of training data, either learner-vs.-self or learner-vs.-other game play; second, the choice of attributes used: a simple encoding of the board layout, a set of derived features, or a combination of these. Experimental results show that the TD learning approach is viable for all four games, that learner-vs.-self play can provide highly effective training data, and that the combination of raw and smart features allows for the development of stronger players.
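The core of the approach described above is the TD update that nudges a position evaluator toward the bootstrapped value of the successor position. As a minimal sketch, the following shows a TD(0) update for a linear evaluation function over board features; the names (`NUM_FEATURES`, `td0_update`, the learning rate, the toy feature vectors) are illustrative assumptions, not details from the paper, which trains neural networks rather than a linear model.

```python
# Hypothetical sketch of a TD(0) update for a game position evaluator.
# V(s) is linear in the board's feature vector; after each move, the
# weights move toward the bootstrapped target r + V(s').
NUM_FEATURES = 8
ALPHA = 0.1  # learning rate (illustrative value)

def value(weights, features):
    # Linear position evaluation: dot product of weights and features.
    return sum(w * f for w, f in zip(weights, features))

def td0_update(weights, features, next_features, reward):
    # TD error: bootstrapped target minus the current estimate.
    delta = reward + value(weights, next_features) - value(weights, features)
    # For a linear V(s), the gradient w.r.t. each weight is the feature value.
    return [w + ALPHA * delta * f for w, f in zip(weights, features)]

weights = [0.0] * NUM_FEATURES
s = [1.0] * NUM_FEATURES        # current position's features (toy data)
s_next = [0.5] * NUM_FEATURES   # successor position's features (toy data)
weights = td0_update(weights, s, s_next, reward=1.0)
```

In learner-vs.-self play, both sides of each training game are played by the evaluator being trained, so every move generates a state transition that feeds this update.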
Similar Papers
Learning to Play Board Games using Temporal Difference Methods
A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) Learning by self-play, (2) Learning by playing against an expert program, and (3) Learning from viewing experts play against themselves. Although the third...
Self-Play and Using an Expert to Learn to Play Backgammon with Temporal Difference Learning
A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: 1) Learning by self-play, 2) Learning by playing against an expert program, and 3) Learning from viewing experts play against each other. Although the third po...
Evolving Small-board Go Players Using Coevolutionary Temporal Difference Learning with Archive
We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique which interleaves two search processes that operate in intra-game and inter-game mode. The intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates ...
Evolving small-board Go players using coevolutionary temporal difference learning with archives
We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique which interweaves two search processes that operate in the intra-game and inter-game mode. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates ...
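The weighted piece counter (WPC) representation used in the two papers above is compact enough to sketch directly: one weight per board intersection, with the position value given by the dot product of the weights and the stone encoding. The following is an illustrative sketch under assumed conventions (+1 for Black, -1 for White, 0 for empty, a 5x5 board); the names and values are hypothetical, not taken from the papers.

```python
# Hypothetical sketch of a weighted piece counter (WPC) for small-board Go:
# one weight per intersection; value = dot(weights, stones).
BOARD = 5  # small-board Go, e.g. 5x5

def wpc_value(weights, stones):
    # stones: flat list of length BOARD*BOARD with +1 (Black), -1 (White), 0 (empty).
    return sum(w * s for w, s in zip(weights, stones))

weights = [0.1] * (BOARD * BOARD)  # toy weight vector
stones = [0] * (BOARD * BOARD)
stones[12] = 1  # a single Black stone at the centre point
print(wpc_value(weights, stones))  # → 0.1
```

Because the encoding is antisymmetric in the players, negating the stone vector flips the sign of the evaluation, so a single weight vector evaluates positions for both sides; the intra-game TDL phase adjusts these weights by gradient descent while the inter-game coevolutionary phase selects among candidate weight vectors.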
GOjen: tdGo Temporal Difference Learning of Go Playing Artificial Neural Networks
The original project description was: an existing Java application handling and visualizing Go games between human and computer players (including trained and evolved ANNs) should be improved and extended with Go-playing ANNs trained by temporal difference learning. This extension should serve as a basis for comparisons of TD learning with conventional ANN training and evolutionary methods...